This is the R Notebook for the story “Avoidable mortality 32% higher in the West Midlands most deprived cities” published in Birmingham Eastside.
It provides with the statistical analysis for the story.
The data comes from the BBC Data Shared Unit, and the correlation analysis that I did for them can be found in the R Notebook called Death correlations.
rr wm <- wm %>% group_by(Local Authority) %>% fill(Deprivation)
rr length(which(is.na(wm)))
[1] 0
wm_persons_avoidable <- wm %>% filter(Sex == "Persons" & `Mortality
` == "Avoidable")
wm_women_avoidable <- wm %>% filter(Sex == "Females" & `Mortality
` == "Avoidable")
wm_men_avoidable <- wm %>% filter(Sex == "Males" & `Mortality
` == "Avoidable")
write.csv(wm_persons_avoidable, "wm_avoidable.csv")
wm_gender_avoidable <- wm %>% filter(`Mortality
` == "Avoidable" & Sex != "Persons")
write.csv(wm_gender_avoidable, "wm_gender_avoidable.csv")
rr summary(wm_persons_avoidable$AS Rate)
Min. 1st Qu. Median Mean 3rd Qu. Max.
173.9 192.0 219.8 223.3 241.2 295.8
rr sd(wm_persons_avoidable$AS Rate)
[1] 37.36339
rr cor.test(wm_persons_avoidable\(`AS Rate`, wm_persons_avoidable\)Deprivation)
Pearson's product-moment correlation
data: wm_persons_avoidable$`AS Rate` and wm_persons_avoidable$Deprivation
t = -7.4067, df = 28, p-value = 4.575e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.9078258 -0.6414475
sample estimates:
cor
-0.8136803
plot(wm_persons_avoidable$Deprivation,wm_persons_avoidable$`AS Rate`, main = "Avoidable mortality higher in deprived areas", xlab = "1 most deprived and 211 least deprived", ylab = "Avoidable death rate per 100,000 inhabitants")
abline(lm(wm_persons_avoidable$`AS Rate` ~ wm_persons_avoidable$Deprivation))
Who?
The national mean in England is 218 avoidable deaths per 100,000 similar to the West Midlands one. 50% of the local authorities are above that mean.
rr wm_persons_avoidable %>% select(Local Authority, AS Rate, Deprivation) %>% filter(AS Rate >= 241) %>% arrange(desc(AS Rate))
#Lower death rate
wm_persons_avoidable %>% select(`Local Authority`, `AS Rate`, Deprivation) %>% filter(`AS Rate` <= 191) %>% arrange(desc(`AS Rate`))
# Fourth quantile
wm_persons_avoidable %>% filter(`AS Rate` >= 241) %>% group_by(Region) %>% summarise(mean(`AS Rate`))
# 2-4 quantile
wm_persons_avoidable %>% filter(`AS Rate` < 241) %>% group_by(Region) %>% summarise(mean(`AS Rate`))
(272.1125-205.5909)/205.5909*100
[1] 32.3563
32% higher death rate in the most deprived quantile.
The random variation in the data is higher in small populations. The funnel plots are a useful way to control for that.
wm_funnelplot <- eng_all %>% filter(Region == "West Midlands")
wm_funnelplot <- wm_funnelplot %>% select(`Local Authority`, `AS Rate`, `All ages`)
wm_funnelplot$`AS Rate` <- wm_funnelplot$`AS Rate`/100000
wm_funnelplot$D_SE <- sqrt((wm_funnelplot$`AS Rate`*(1-wm_funnelplot$`AS Rate`)) / (wm_funnelplot$`All ages`))
colnames(wm_funnelplot) <- c("LA", "Death", "Pop", "SE_Death")
## common effect (fixed effect model) -- Mean
death.fem <- weighted.mean(wm_funnelplot$Death, 1/wm_funnelplot$SE_Death^2)
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
## lower and upper limits for 95% and 99.9% CI, based on FEM estimator
pop.seq <- seq(10000, max(wm_funnelplot$Pop), 10000)
ll95 <- death.fem - 1.96 * sqrt((death.fem*(1-death.fem)) / (pop.seq))
ul95 <- death.fem + 1.96 * sqrt((death.fem*(1-death.fem)) / (pop.seq))
ll999 <- death.fem - 3.29 * sqrt((death.fem*(1-death.fem)) / (pop.seq))
ul999 <- death.fem + 3.29 * sqrt((death.fem*(1-death.fem)) / (pop.seq))
CI <- data.frame(ll95, ul95, ll999, ul999, pop.seq, death.fem)
## draw plot
FP_WM_only <- ggplot(aes(x = Pop, y = Death), data = wm_funnelplot) +
geom_point(shape = 1, aes(text = paste(LA,"<br>","Avoidable death rate: ",(Death*100000))), size = 2) +
geom_line(aes(x = pop.seq, y = ll95, color = "steelblue3"), data = CI, color = "steelblue3") +
geom_line(aes(x = pop.seq, y = ul95, color = "steelblue3"), data = CI, color = "steelblue3")+
geom_line(aes(x = pop.seq, y = ll999, color = "steelblue1"), linetype = "dashed", data = CI, color = "steelblue1") +
geom_line(aes(x = pop.seq, y = ul999, color = "steelblue1"), linetype = "dashed", data = CI, color = "steelblue1") + geom_hline(aes(yintercept = death.fem, color = "olivedrab"), data = CI, color = "olivedrab") + xlim(0,1150000) + scale_y_continuous(labels = function(y) y*100000) + labs(title = "Avoidable deaths in the West Midlands", x = "Local authority population size", y = "Avoidable deaths per 100,000\n\n") + annotate("text",900000,.00390, label="99.9% limits", col="steelblue1", size=3.5, hjust=0)+ annotate("text",900000,.00372, label="95% limits", col="steelblue3", size=3.5, hjust=0) + annotate("text",900000,.00355, label="Regional mean", col="olivedrab", size=3.5, hjust=0) + theme_bw()
Ignoring unknown aesthetics: text
FP_WM_only
The blue lines are the CI 95% and 99.9% and the green line is the regional mean. The dots inside the lines are within the “expected” random variation, while the dots outside is more unlikely to be due to the random variation caused by the sample side.
I make the chart interactive to explore the information.
install.packages("plotly")
library(plotly)
chart <- ggplotly(FP_WM_only, tooltip = c("text"))
We recommend that you use the dev version of ggplot2 with `ggplotly()`
Install it with: `devtools::install_github('hadley/ggplot2')`
chart
#To share in plotly
Sys.setenv("plotly_username"="CarmenAguilar")
Sys.setenv("plotly_api_key"="Tdpsdo2VWILpCenOMOQi")
chart_link = api_create(chart, filename = "FunnelPlot")
Found a grid already named: 'FunnelPlot Grid'. Since fileopt='overwrite', I'll try to update it
Found a plot already named: 'FunnelPlot'. Since fileopt='overwrite', I'll try to update it
chart_link
The chart in plotly does not display properly. It does not respect the transformation in the y axis. As that information is in each data point, I removed it to avoid misundertandings.
?theme